The Abject Failure of Keyword IR for Mathematics Search: Berkeley at NTCIR-10 Math
Abstract
This paper demonstrates that classical content search using individual keywords is inadequate for mathematical formulae search. For the NTCIR-10 Math Pilot Task, the authors used standard content-word indexing for search, coupled with search for components of mathematical formulae. This was followed by formula extraction from the top-ranked documents. Performance was terrible, even for partial relevance. Adding some manual reformulation of topics into queries did not improve retrieval performance.
Similar resources
Similarity Search for Mathematics: Masaryk University Team at the NTCIR-10 Math Task
This paper describes and summarizes the experiences of the Masaryk University team MIRMU with the mathematical search performed for the NTCIR pilot Math Task. Our approach is similarity search based on enhanced full-text search utilizing attested state-of-the-art techniques and implementations. The variability of used Math Indexer and Searcher (MIaS) system in terms of the math query notation was t...
NTCIR-11 Math-2 Task Overview
• Mathematics plays a fundamental role in Science, Technology, and Engineering (learn from Math, apply for STEM).
• Mathematical knowledge is rich in content, sophisticated in structure, and technical in presentation.
• There are a lot of documents with maths:
  – 120,000 journal articles per year in pure/applied math, 3.5 million overall
  – 50 million science articles in 2010 with a doubling time of 8-...
The MCAT Math Retrieval System for NTCIR-10 Math Track
The NTCIR Math Track targets mathematical content access based on both natural language text and mathematical formulae. This research describes the participation of the MCAT group in the NTCIR math retrieval subtask and math understanding subtask. We introduce our mathematical search system, which is capable of both formula search and full-text search. We also introduce our mathematical description extractio...
Berkeley at NTCIR-2: Chinese, Japanese, and English IR experiments
This paper reports on the work of the Berkeley group at the second NTCIR workshop on Japanese & English IR and Chinese IR. A number of runs were submitted on all subtasks in the two main tasks. Our main focus in the Japanese monolingual subtask was on comparing the retrieval effectiveness of different segmentation methods. The experimental results show the bigram indexing outperformed the word-base...
Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies
This paper summarizes the experience of the Math Information Retrieval team of Masaryk University (MIRMU) with the NTCIR-12 MathIR arXiv Main Task and its subtasks. We based our approach on the MIaS system. Using the NTCIR-11 Math-2 Task relevance judgements, we developed an evaluation platform. Using this platform we rigorously evaluated combinations of new features and picked the most promising on...